Abstract

We consider $\min\{f(x) : g(x) \le 0,\ x \in X\}$, where $X$ is a compact convex subset of $\mathbb{R}^m$, and $f$ and $g$ are continuous convex functions defined on an open neighbourhood of $X$. We work in the setting of derivative-free optimization, assuming that $f$ and $g$ are available through a black-box that provides only function values for a lower-$\mathcal{C}^2$ representation of the functions. We present a derivative-free optimization variant of the $\varepsilon$-comirror algorithm [3]. Algorithmic convergence hinges on the ability to accurately approximate subgradients of lower-$\mathcal{C}^2$ functions, which we prove is possible through linear interpolation. We provide a convergence analysis that quantifies the difference between the function values of the iterates and the optimal function value. We find that the DFO algorithm we develop has the same convergence result as the original gradient-based algorithm. We present some numerical testing that demonstrates the practical feasibility of the algorithm, and conclude with some directions for further research.

Keywords: convex optimization, derivative-free optimization, lower-$\mathcal{C}^2$, approximate subgradient, non-Euclidean projected subgradient, Bregman distance.
1 Introduction

In this paper we introduce a derivative-free linear interpolation-based method for solving constrained optimization problems of the form

$$(P):\ \min\{f(x) : g(x) \le 0,\ x \in X\}, \tag{1.1}$$

where $f$ and $g$ are continuous convex functions defined on a nonempty open convex subset $O$ of $\mathbb{R}^m$, and where the constraint set $X$ is a nonempty compact convex subset of $O$. We further assume that we have access to lower-$\mathcal{C}^2$ representations of $f$ and $g$ and that the problem is feasible, i.e., there exists some $x_0 \in X$ such that $g(x_0) \le 0$. The algorithm is based on the $\varepsilon$-comirror algorithm presented in [3].

Derivative-free optimization (DFO) is a rapidly growing field of research that explores the minimization of a black-box function when first-order information (derivatives, gradients, or subgradients) is unavailable. While the majority of past work in DFO has focused on unconstrained optimization, several methods have recently been introduced for constrained optimization. In constrained optimization, most of the analysis of DFO methods has been done within the framework of direct search and pattern search methods, that is, methods that do not attempt to build interpolation (or other such) models of the objective function, but instead use concepts like positive bases to ensure convergence. Such methods can be adapted to constrained optimization through techniques such as projecting search directions onto constraint sets [17], [16], "pulling back" search directions onto manifolds [13], [14], the use of filtering techniques [1], or barrier-based penalties [?].

On the other hand, fairly little research has explored approaching constrained optimization via model-based DFO methods. Notable in this area is [23], [2], which extends UOBYQA [20] to constrained optimization (in an algorithm named CONDOR). This paper provides a novel model-based DFO method for constrained optimization. Our algorithm is designed for constraints defined by a given convex function.

Our algorithm is based on the $\varepsilon$-comirror algorithm [3]. The $\varepsilon$-comirror algorithm finds its roots in mirror-descent methods [19], [5], [4]. These methods can be viewed as nonlinear projected subgradient methods that use a general distance-like function (the Bregman distance) instead of the usual squared Euclidean distance [4]. The $\varepsilon$-comirror algorithm adapts the mirror-descent method to convex constrained optimization where the constraint set is provided by a convex function. It requires that the problem is additionally constrained by a convex compact set and that the subgradients (of both the constraint function and the objective function) are bounded over this set.

The algorithm presented here differs from previous research in two other notable ways. First, unlike past model-based DFO methods, we do not assume that the objective function is $\mathcal{C}^2$; instead, we work with the broader class of lower-$\mathcal{C}^2$ functions (see Definition 2.1). Lower-$\mathcal{C}^2$ functions include convex functions [22, Theorem 10.33] and $\mathcal{C}^2$ functions (by definition), as well as fully amenable functions [22, Exercise 10.36] and finite max functions (Example 2.3 below). To work with lower-$\mathcal{C}^2$ functions, we develop a method to approximate subgradients for such functions and analyze it for the derivative-free algorithm. In particular, in Theorem 3.3 we define the approximate subgradient for an arbitrary lower-$\mathcal{C}^2$ function and prove that it satisfies an error bound analogous to the one introduced in [8, Theorem 2.11] for the class of $\mathcal{C}^{1}$ functions with Lipschitz gradient.

The second major difference from previous DFO research is that we present a convergence result that quantifies the difference between the function values of the iterates and the optimal function value. To the best of our knowledge, this provides the first results of this kind for a multivariable DFO method. It is remarkable that the DFO algorithm we develop has the same convergence result as the original gradient-based algorithm presented in [3]. (A quadratically convergent DFO method is developed in [15], but only for functions defined on $\mathbb{R}$. Furthermore, in [18], a superlinearly convergent algorithm is presented.)

The remainder of this paper is organized as follows. Section 2 is a brief introduction to the main building blocks we use. First, we provide the definition of the class of lower-$\mathcal{C}^2$ functions and some properties. Second, we provide the definition of the linear interpolation model of a function $f$ over a subset $Y$ of $\mathbb{R}^m$ and a sufficient condition for it to be well defined. Finally, we give the definition and the main properties of Bregman distances. In Section 3 we give the first key result in Theorem 3.3, on which we build our convergence results. In Section 4 we describe our derivative-free $\varepsilon$-comirror algorithm. In Theorem 4.3 we establish the convergence analysis. In Section 5 we provide some numerical results that confirm the practical feasibility of the algorithm. Section 6 summarizes some concluding remarks. To make the presentation self-contained we add Appendix A, which includes the proofs of two basic inequalities.

2 Auxiliary Results 

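We shall work in $\mathbb{R}^m$, equipped with the usual Euclidean norm $\|\cdot\|$. Throughout the remainder of the paper, we suppose that

$O$ is a nonempty open convex subset of $\mathbb{R}^m$.

Recall that for a convex function $f : O \to \mathbb{R}$, the subdifferential $\partial f$ at a point $x \in O$ is defined by

$$\partial f(x) = \{v \in \mathbb{R}^m : f(y) \ge f(x) + \langle v, y - x\rangle \text{ for all } y \in O\}. \tag{2.1}$$

We denote the closed ball in $\mathbb{R}^m$ centred at $x_0$ with radius $\Delta > 0$ by

$$B(x_0;\Delta) = \{x \in \mathbb{R}^m : \|x - x_0\| \le \Delta\},$$

and the set of natural numbers by

$$\mathbb{N} = \{1, 2, 3, \ldots\}.$$

Given $r \in \mathbb{N}$, we abbreviate the unit simplex in $\mathbb{R}^r$ by

$$S_r := \Big\{\lambda \in \mathbb{R}^r : \sum_{i=1}^{r} \lambda_i = 1,\ \lambda_i \in [0,1],\ i \in \{1,\ldots,r\}\Big\}.$$

Finally, we shall use $\|L\|$ to denote the spectral norm of a matrix $L \in \mathbb{R}^{m \times m}$.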



The Class of Lower-$\mathcal{C}^2$ Functions

We next introduce the class of lower-$\mathcal{C}^2$ functions.

Definition 2.1. [22, Definition 10.29] A function $f : O \to \mathbb{R}$ is said to be a lower-$\mathcal{C}^k$ function at $\bar{x} \in O$ if there exists a neighbourhood $V = V(\bar{x}) \subseteq O$ and a representation

$$f(x) = \max_{t \in T} f_t(x) \tag{2.2}$$

in which all functions $f_t$ are of class $\mathcal{C}^k$ on $V$, the index set $T := T(\bar{x})$ is a compact topological space, and $f_t$ and the first $k$ derivatives of $f_t$ depend continuously not just on $x \in V$ but even on $(t,x) \in T \times V$. In this case we say that (2.2) provides a lower-$\mathcal{C}^k$ representation of $f$ at $\bar{x} \in O$. The function $f$ is said to be lower-$\mathcal{C}^k$ on $O$ if $f$ is lower-$\mathcal{C}^k$ at every point $\bar{x} \in O$.

The next lemma states that every convex function is lower-$\mathcal{C}^2$.

Lemma 2.2. [22, Theorem 10.33] Let $f : O \to \mathbb{R}$ be convex. Then $f$ is lower-$\mathcal{C}^2$ on $O$.

Although the class of lower-$\mathcal{C}^2$ functions includes all convex functions [22, Theorem 10.33], it should be noted that our algorithm requires access to a lower-$\mathcal{C}^2$ representation of the objective and constraint functions. The next example shows that any finite max function is not only lower-$\mathcal{C}^2$, but also provides a natural lower-$\mathcal{C}^2$ representation.

Example 2.3. Let $f : O \to \mathbb{R}$ be defined as $f = \max\{f_1, \ldots, f_n\}$, where each $f_i$ is of class $\mathcal{C}^k$ on $O$. Then $f$ is lower-$\mathcal{C}^k$ on $O$. (This is the case where $T$ is $\{1, \ldots, n\}$ equipped with the discrete topology.)

The value of working with lower-$\mathcal{C}^2$ functions is seen in Lemma 2.4, which demonstrates how to compute the subdifferential of a lower-$\mathcal{C}^2$ function.

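Lemma 2.4. Let $f : O \to \mathbb{R}$ be a convex function that has a lower-$\mathcal{C}^2$ representation $f(x) = \max_{t \in T} f_t(x)$ at $\bar{x} \in O$ and set $A(\bar{x}) = \operatorname{argmax}_{t \in T} f_t(\bar{x})$. Then

$$\partial f(\bar{x}) = \operatorname{conv}\{\nabla f_t(\bar{x}) : t \in A(\bar{x})\}.$$

Proof. Combine [22, Theorem 10.31] and [22, Proposition 8.12]. □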
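For a finite max function (Example 2.3), Lemma 2.4 can be illustrated with a few lines of code. The following is a minimal sketch, not part of the original development: the helper names `active_indices` and `subgradient_of_max`, the tolerance used to detect ties, and the example function are all assumptions made for illustration.

```python
def active_indices(values, tol=1e-12):
    """Indices t whose value f_t(x) attains max_t f_t(x) up to tol."""
    top = max(values)
    return [t for t, v in enumerate(values) if top - v <= tol]


def subgradient_of_max(funcs, grads, x, weights=None):
    """Return (A(x), v) with v a convex combination of the active gradients.

    funcs and grads are parallel lists of callables. weights, if given,
    must be nonnegative and sum to 1 over the active indices; the default
    is the uniform combination (an arbitrary illustrative choice).
    """
    values = [f(x) for f in funcs]
    active = active_indices(values)
    if weights is None:
        weights = [1.0 / len(active)] * len(active)
    v = [0.0] * len(x)
    for lam, t in zip(weights, active):
        g = grads[t](x)
        for i in range(len(x)):
            v[i] += lam * g[i]
    return active, v


# Hypothetical example: f(x) = max(x1, x2, -x1 - x2) on R^2.
funcs = [lambda x: x[0], lambda x: x[1], lambda x: -x[0] - x[1]]
grads = [lambda x: [1.0, 0.0], lambda x: [0.0, 1.0], lambda x: [-1.0, -1.0]]

# At (1, 1) the first two pieces are active, so v lies on the segment
# between their gradients.
active, v = subgradient_of_max(funcs, grads, [1.0, 1.0])
```

Any other choice of weights in the unit simplex over the active indices gives another element of the subdifferential.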

Theorem 2.5. [22, Proposition 10.54] Let $f : O \to \mathbb{R}$ be a lower-$\mathcal{C}^2$ function, and let $X$ be a nonempty compact subset of $O$. Then there exists an open set $O'$ with $X \subseteq O' \subseteq O$, such that $f$ has a common lower-$\mathcal{C}^2$ representation valid at all points $x \in O'$, i.e., there exists a compact topological space $T$ and a family of functions $(f_t)_{t \in T}$ defined on $O'$ such that

$$f = \max_{t \in T} f_t \quad \text{on } O', \tag{2.3}$$

and the functions $(t,x) \mapsto f_t(x)$, $(t,x) \mapsto \nabla f_t(x)$, and $(t,x) \mapsto \nabla^2 f_t(x)$ are continuous on $T \times O'$.

To prove convergence of the algorithm introduced in this paper, we require bounds on the subgradients of the objective and the constraint functions. Lemma 2.6 proves the existence of such a bound.

Lemma 2.6. Let $f : O \to \mathbb{R}$ be convex, and let $X$ be a nonempty compact subset of $O$. Then

$$\sup \|\partial f(X)\| < +\infty.$$

Proof. Since $f$ is convex, Lemma 2.2 implies that $f$ is lower-$\mathcal{C}^2$ on $O$. Since $X$ is a nonempty compact subset of $O$, Theorem 2.5 guarantees the existence of an open subset $O'$ with $X \subseteq O' \subseteq O$ such that $f$ has a common lower-$\mathcal{C}^2$ representation valid at all points $x \in O'$. Let $f = \max_{t \in T} f_t$ be as stated in Theorem 2.5. The definition of lower-$\mathcal{C}^2$ implies that the mapping $(t,x) \mapsto \|\nabla f_t(x)\|$ is continuous on $T \times O'$. By the Weierstrass Theorem, $L := \max_{(t,x) \in T \times X} \|\nabla f_t(x)\| < \infty$. Now, let $x \in X$, and let $v \in \partial f(x)$. Using Lemma 2.4 we know that $v = \sum_{t \in A(x)} \lambda_t \nabla f_t(x)$ for some $\lambda \in S_r$, where $r \in \mathbb{N}$ is the number of elements in $A(x)$. Therefore

$$\|v\| = \Big\|\sum_{t \in A(x)} \lambda_t \nabla f_t(x)\Big\| \le \sum_{t \in A(x)} \lambda_t \|\nabla f_t(x)\| \le \sum_{t \in A(x)} \lambda_t L = L,$$

and the proof is complete. (Alternatively, one may either consider the lower semicontinuous hull of $f$ and apply [21, Theorem 24.7], or use [22, Corollary 12.38] after extending $\partial f$ to a maximally monotone operator.) □
Lemma 2.7. Let $f : O \to \mathbb{R}$ be a lower-$\mathcal{C}^2$ function, and let $X$ be a nonempty compact convex subset of $O$. Let $O'$, $T$, and $(f_t)_{t \in T}$ be as in Theorem 2.5. Then there exists $K_f \ge 0$ such that $\nabla f_t$ is $K_f$-Lipschitz on $O'$ for every $t \in T$.

Proof. By Theorem 2.5, $(t,x) \mapsto \nabla^2 f_t(x)$ is continuous on the compact set $T \times X$. Therefore, by the Weierstrass theorem, $K_f := \max_{(t,x) \in T \times X} \|\nabla^2 f_t(x)\| < +\infty$. Now apply the Mean Value Theorem [12, Theorem 5.1.12]. □

The Linear Interpolation Model 

In our method we use a derivative-free model-based technique. Therefore, in this section we 
introduce the definition of the linear interpolation model and related facts. 

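Definition 2.8. Let $f : O \to \mathbb{R}$ be a function, and let $Y = (y_0, y_1, \ldots, y_m) \in \mathbb{R}^{m \times (m+1)}$. If the matrix

$$Q = \begin{pmatrix} 1 & y_{0,1} & \cdots & y_{0,m} \\ 1 & y_{1,1} & \cdots & y_{1,m} \\ \vdots & \vdots & & \vdots \\ 1 & y_{m,1} & \cdots & y_{m,m} \end{pmatrix}$$

is invertible, then $Y$ is said to be a poised tuple centred at $y_0$. Moreover, if $\{y_0, y_1, \ldots, y_m\} \subseteq O$ then $Y$ is said to be a poised tuple centred at $y_0$ with respect to $f$. In this case the linear system

$$\begin{pmatrix} 1 & y_{0,1} & \cdots & y_{0,m} \\ 1 & y_{1,1} & \cdots & y_{1,m} \\ \vdots & \vdots & & \vdots \\ 1 & y_{m,1} & \cdots & y_{m,m} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix} = \begin{pmatrix} f(y_0) \\ f(y_1) \\ \vdots \\ f(y_m) \end{pmatrix}$$

has a unique solution $(\alpha_0, \alpha_1, \ldots, \alpha_m) \in \mathbb{R}^{m+1}$, and the Linear Interpolation Model of the function $f$ over $Y$ is the unique (well defined) function

$$F : \mathbb{R}^m \to \mathbb{R} : x \mapsto \alpha_0 + \sum_{i=1}^{m} \alpha_i x_i.$$

Note that in this case $F$ satisfies the interpolation conditions

$$F(y_i) = f(y_i), \quad \text{for every } i \in \{0, 1, \ldots, m\}.$$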
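The construction in Definition 2.8 amounts to solving one $(m+1) \times (m+1)$ linear system. The sketch below is an illustration only, written in plain Python with a small Gaussian elimination so that it is self-contained; the function names, the singularity threshold, and the test function are assumptions, not part of the paper.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("sample tuple is not poised (Q is singular)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def linear_interpolation_model(f, points):
    """Return (alpha0, grad) with F(x) = alpha0 + <grad, x>, F(y_i) = f(y_i)."""
    q = [[1.0] + list(y) for y in points]   # rows (1, y_{i,1}, ..., y_{i,m})
    rhs = [f(y) for y in points]
    alpha = solve_linear_system(q, rhs)
    return alpha[0], alpha[1:]


# Hypothetical test function and poised sample tuple in R^2.
f = lambda x: x[0] ** 2 + 3.0 * x[1]
Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alpha0, grad_F = linear_interpolation_model(f, Y)
F = lambda x: alpha0 + sum(g * xi for g, xi in zip(grad_F, x))
```

The vector `grad_F` is the (constant) gradient of the model, which is the quantity the error bound below is about.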

The following theorem provides the error bound satisfied by the gradient of the linear interpolation model.

Theorem 2.9. [8, Theorem 2.11] Suppose that $f : O \to \mathbb{R}$ is a $\mathcal{C}^{1}$ function on $O$. Let $y_0 \in O$. Assume that $Y = (y_0, y_1, \ldots, y_m) \in \mathbb{R}^{m \times (m+1)}$ is a poised tuple of sample points centred at $y_0$ with respect to $f$. Set $\Delta = \max_{1 \le i \le m} \|y_i - y_0\|$. Suppose that $B(y_0;\Delta) \subseteq O$. Let $\nabla f$ be $K_f$-Lipschitz over $B(y_0;\Delta)$. Then the gradient of the linear interpolation model $F$ satisfies an error bound of the form

$$\|\nabla f(y) - \nabla F(y)\| \le K\Delta, \quad \text{for all } y \in B(y_0;\Delta),$$

where

$$K := K_f\big(1 + \sqrt{m}\,\|L^{-1}\|/2\big) \quad \text{and} \quad L = L(Y) := \frac{1}{\Delta}\begin{pmatrix} (y_1 - y_0)^{\top} \\ (y_2 - y_0)^{\top} \\ \vdots \\ (y_m - y_0)^{\top} \end{pmatrix}. \tag{2.4}$$
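The bound of Theorem 2.9 can be checked numerically on a simple case. The sketch below assumes coordinate-aligned sample points $y_i = y_0 + \Delta e_i$ and assumes that $L(Y)$ is scaled by $1/\Delta$, so that $L$ is the identity and $\|L^{-1}\| = 1$; the test function, its Lipschitz constant $K_f = 6$, and the helper name are illustrative choices, not taken from the paper.

```python
import math


def forward_simplex_gradient(f, y0, delta):
    """Gradient of the linear model through y0 and y0 + delta * e_i."""
    f0 = f(y0)
    grad = []
    for i in range(len(y0)):
        yi = list(y0)
        yi[i] += delta
        grad.append((f(yi) - f0) / delta)
    return grad


# Hypothetical smooth function whose gradient is 6-Lipschitz.
f = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2
grad_f = lambda x: [2.0 * x[0], 6.0 * x[1]]
K_f, m = 6.0, 2
y0 = [1.0, -1.0]

norm_L_inv = 1.0  # assumption: L = I for this scaled, axis-aligned tuple
K = K_f * (1.0 + math.sqrt(m) * norm_L_inv / 2.0)

# For each radius, compare the actual gradient error at y0 with K * delta.
checks = []
for delta in (1.0, 0.1, 0.01):
    gF = forward_simplex_gradient(f, y0, delta)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(gF, grad_f(y0))))
    checks.append((delta, err, K * delta))
```

For this quadratic the error is exactly $\sqrt{10}\,\Delta$, so it shrinks linearly with the sampling radius, as the theorem predicts.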



The Bregman Distance: Definition and Properties 

The last building block used in our analysis is the Bregman distance. 

Definition 2.10. [6] Let $\omega : O \to \mathbb{R}$ be a convex differentiable function. The corresponding Bregman distance $D_\omega$ is

$$D_\omega : O \times O \to \mathbb{R} : (u,v) \mapsto \omega(u) - \omega(v) - \langle \nabla\omega(v), u - v\rangle. \tag{2.5}$$

Definition 2.11. [26, Section 3.5] Let $C$ be a nonempty convex subset of $\mathbb{R}^m$. Let $\omega : C \to \mathbb{R}$. Then $\omega$ is said to be strongly convex with convexity parameter $\alpha > 0$, if for all $x, y \in C$, $t \in [0,1]$ we have

$$\omega(tx + (1-t)y) \le t\,\omega(x) + (1-t)\,\omega(y) - \frac{\alpha}{2}\,t(1-t)\|x - y\|^2.$$

Throughout the next arguments we shall assume that $\omega$ is a strongly convex and differentiable function on a nonempty convex subset of $\mathbb{R}^m$, with convexity parameter $\alpha > 0$. In this paper we shall be interested in Bregman distances that are created from strongly convex functions.
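As a concrete illustration of Definition 2.10, the following sketch evaluates the Bregman distance for two common generating functions: half the squared Euclidean norm and the negative entropy. Both are illustrative choices with $\alpha = 1$ on the region used here (the entropy only for points with coordinates in $(0,1]$); the helper names are hypothetical.

```python
import math


def bregman(omega, grad_omega, u, v):
    """D_omega(u, v) = omega(u) - omega(v) - <grad omega(v), u - v>."""
    inner = sum(g * (ui - vi) for g, ui, vi in zip(grad_omega(v), u, v))
    return omega(u) - omega(v) - inner


# omega(x) = 0.5 * ||x||^2, strongly convex with alpha = 1.
half_sq = lambda x: 0.5 * sum(xi * xi for xi in x)
grad_half_sq = lambda x: list(x)

# Negative entropy sum_i x_i ln x_i, defined for positive coordinates.
neg_entropy = lambda x: sum(xi * math.log(xi) for xi in x)
grad_neg_entropy = lambda x: [math.log(xi) + 1.0 for xi in x]

u, v = [0.2, 0.8], [0.5, 0.5]
d_euclid = bregman(half_sq, grad_half_sq, u, v)            # 0.5 * ||u - v||^2
d_entropy = bregman(neg_entropy, grad_neg_entropy, u, v)   # KL divergence here
sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
```

Because `u` and `v` both sum to one, the entropy-based distance reduces to the Kullback-Leibler divergence, and both values dominate $\tfrac{\alpha}{2}\|u - v\|^2$ with $\alpha = 1$.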

The following result is part of the folklore (and is established in much greater generality in, e.g., [26, Section 3.5]); for completeness we include the proof.

Lemma 2.12. Let $\omega : O \to \mathbb{R}$ be a differentiable function. Let $X$ be a nonempty convex subset of $O$. Then the following are equivalent:

(i) $\omega(\lambda x + (1-\lambda)y) \le \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|^2$ for all $x, y \in X$ and $\lambda \in\, ]0,1[$.

(ii) $D_\omega(x,y) = \omega(x) - \omega(y) - \langle \nabla\omega(y), x - y\rangle \ge \frac{\alpha}{2}\|x - y\|^2$ for all $x, y \in X$.

(iii) $\langle \nabla\omega(x) - \nabla\omega(y), x - y\rangle \ge \alpha\|x - y\|^2$ for all $x, y \in X$.
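Proof. "(i)$\Rightarrow$(ii)": Rewrite (i) as

$$\omega(y + \lambda(x - y)) \le \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|^2.$$

Hence

$$\frac{\omega(y + \lambda(x - y)) - \omega(y)}{\lambda} \le \omega(x) - \omega(y) - \frac{\alpha}{2}(1-\lambda)\|x - y\|^2. \tag{2.6}$$

Taking the limit as $\lambda \to 0^+$ and using the assumption that $\omega$ is differentiable we see that

$$\langle \nabla\omega(y), x - y\rangle \le \omega(x) - \omega(y) - \frac{\alpha}{2}\|x - y\|^2.$$

Hence (ii) holds.

"(ii)$\Rightarrow$(i)": Suppose that (ii) holds for all $x, y \in X$. Let $\lambda \in\, ]0,1[$. Set $z = \lambda x + (1-\lambda)y \in X$. Applying (ii) to $x$ and $z$ yields

$$\omega(z) \le \omega(x) - \langle \nabla\omega(z), x - z\rangle - \frac{\alpha}{2}\|x - z\|^2. \tag{2.7}$$

Similarly, applying (ii) to $y$ and $z$ yields

$$\omega(z) \le \omega(y) - \langle \nabla\omega(z), y - z\rangle - \frac{\alpha}{2}\|y - z\|^2. \tag{2.8}$$

Multiplying (2.7) by $\lambda$ and (2.8) by $(1-\lambda)$, and adding, we get

$$\omega(z) \le \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \lambda\langle \nabla\omega(z), x - z\rangle - (1-\lambda)\langle \nabla\omega(z), y - z\rangle - \frac{\alpha}{2}\big(\lambda\|x - z\|^2 + (1-\lambda)\|y - z\|^2\big).$$

Notice that $x - z = (1-\lambda)(x - y)$ and $y - z = \lambda(y - x)$. Thus, substituting in the last inequality we get

$$\begin{aligned}
\omega(z) &\le \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \lambda\langle \nabla\omega(z), (1-\lambda)(x - y)\rangle - (1-\lambda)\langle \nabla\omega(z), \lambda(y - x)\rangle \\
&\qquad - \frac{\alpha}{2}\big[\lambda(1-\lambda)^2\|x - y\|^2 + (1-\lambda)\lambda^2\|x - y\|^2\big] \\
&= \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \lambda(1-\lambda)\langle \nabla\omega(z), x - y\rangle + \lambda(1-\lambda)\langle \nabla\omega(z), x - y\rangle \\
&\qquad - \frac{\alpha}{2}\lambda(1-\lambda)\big((1-\lambda)\|x - y\|^2 + \lambda\|x - y\|^2\big) \\
&= \lambda\,\omega(x) + (1-\lambda)\,\omega(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|^2.
\end{aligned}$$

Substituting $z = \lambda x + (1-\lambda)y$ gives (i).

"(ii)$\Rightarrow$(iii)": Suppose that (ii) holds for all $x, y \in X$. Then we have

$$\omega(x) - \omega(y) - \langle \nabla\omega(y), x - y\rangle \ge \frac{\alpha}{2}\|x - y\|^2, \tag{2.9}$$

$$\omega(y) - \omega(x) + \langle \nabla\omega(x), x - y\rangle \ge \frac{\alpha}{2}\|x - y\|^2. \tag{2.10}$$

Adding (2.9) and (2.10) we get (iii).

"(iii)$\Rightarrow$(ii)": By the fundamental theorem of calculus we have

$$\omega(x) - \omega(y) = \int_0^1 \langle \nabla\omega(y + t(x - y)), x - y\rangle\, dt.$$

Subtracting $\langle \nabla\omega(y), x - y\rangle$, noting that $\int_0^1 \langle \nabla\omega(y), x - y\rangle\, dt = \langle \nabla\omega(y), x - y\rangle$, and using (iii) we get

$$\begin{aligned}
\omega(x) - \omega(y) - \langle \nabla\omega(y), x - y\rangle &= \int_0^1 \langle \nabla\omega(y + t(x - y)) - \nabla\omega(y), x - y\rangle\, dt \\
&= \int_0^1 \frac{1}{t}\langle \nabla\omega(y + t(x - y)) - \nabla\omega(y), t(x - y)\rangle\, dt \\
&\ge \int_0^1 \frac{1}{t}\,\alpha\|t(x - y)\|^2\, dt \\
&= \alpha\|x - y\|^2 \int_0^1 \frac{1}{t}\,t^2\, dt \\
&= \frac{\alpha}{2}\|x - y\|^2,
\end{aligned}$$

which completes the proof. □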



Following [3], we give the definition of the Bregman diameter of an arbitrary set $X$.

Definition 2.13. Let $\omega : O \to \mathbb{R}$ be a convex differentiable function. Let $X$ be a nonempty subset of $O$. The Bregman diameter of the set $X$ is defined as

$$\Theta := \sup\{D_\omega(u,v) : u, v \in X\}. \tag{2.11}$$

In the following lemma we prove that, if $\omega$ is differentiable and strongly convex, then the Bregman diameter is finite for every compact subset of $\mathbb{R}^m$.

Lemma 2.14. Let $\omega : O \to \mathbb{R}$ be a differentiable convex function. Let $X$ be a nonempty compact subset of $O$. Then $D_\omega$ is bounded on $X \times X$. Consequently, the Bregman diameter of the set $X$ is finite.

Proof. Since $\omega$ is convex and differentiable, $\omega$ is continuously differentiable on $O$ [21, Corollary 25.5.1]. Thus, $\omega$ and $\nabla\omega$ are continuous on $X$, and therefore $D_\omega$ is continuous on $X \times X$. Now, $X \times X$ is a nonempty compact subset of $\mathbb{R}^m \times \mathbb{R}^m$, and therefore $D_\omega$ is bounded on $X \times X$ and the Bregman diameter of the set $X$ is finite. □

3 Functional Constraints and Assumptions

Recall that we are interested in the general convex problem of the form

$$(P):\ \min\{f(x) : g(x) \le 0,\ x \in X\}. \tag{3.1}$$

In the sequel, we shall consider the following assumptions on $f$, $g$ and $X$.

A1 $f : O \to \mathbb{R}$ and $g : O \to \mathbb{R}$ are continuous convex functions.
A2 $X$ is a nonempty compact convex subset of $O$, and $X$ is not a singleton.
A3 We have access to lower-$\mathcal{C}^2$ representations (see Theorem 2.5) of $f$ and $g$ on some open subset $O'$ of $O$ such that $X \subseteq O'$ and

$$f = \max_{t \in T} f_t \quad \text{and} \quad g = \max_{t \in T} g_t \quad \text{on } O'.$$

A4 The set of optimal solutions of problem (P) is nonempty.

Remark 3.1. Under Assumption A1, the functions $f$ and $g$ are lower-$\mathcal{C}^2$ functions on $O$ (by Lemma 2.2). Assumption A3 provides the stronger statement that we have access to lower-$\mathcal{C}^2$ representations of these functions.

Lemma 3.2. Suppose that Assumptions A1 and A2 hold. Then

$$L_f := \sup\|\partial f(X)\| < +\infty \quad \text{and} \quad L_g := \sup\|\partial g(X)\| < +\infty. \tag{3.2}$$

Proof. Combine Remark 3.1, Assumption A2, and Lemma 2.6. □
In the following theorem, we give an error bound for the approximate subgradient.

Theorem 3.3. Suppose that A1, A2, A3, and A4 hold. Let $Y = (y_0, y_1, \ldots, y_m) \in \mathbb{R}^{m \times (m+1)}$ be a poised tuple of sample points centred at $y_0 \in X$ with respect to $f$. Set $\Delta = \max_{1 \le i \le m} \|y_i - y_0\|$. Suppose that $B(y_0;\Delta) \subseteq X$. Let $y \in B(y_0;\Delta)$. Let $(t_1, \ldots, t_r) \in A(y)^r$ and $\lambda \in S_r$, where $r \in \mathbb{N}$. Define $V = V(y) := \sum_{i=1}^{r} \lambda_i \nabla F_{t_i}(y)$. Then there exists $v \in \partial f(y)$ such that the following error bound holds:

$$\|V - v\| \le K_f\big(1 + \sqrt{m}\,\|L^{-1}\|/2\big)\Delta,$$

where $K_f$ is as in Lemma 2.7 and $L = L(Y)$ is as defined in Theorem 2.9.

Proof. By assumption, $V = \sum_{i=1}^{r} \lambda_i \nabla F_{t_i}(y)$. Lemma 2.4 implies that $v = v(y) := \sum_{i=1}^{r} \lambda_i \nabla f_{t_i}(y) \in \partial f(y)$. Using the triangle inequality, the error bound given in Theorem 2.9 (applied to $O'$ instead of $O$) and Lemma 2.7 we have

$$\begin{aligned}
\|V - v\| &= \Big\|\sum_{i=1}^{r} \lambda_i\big(\nabla F_{t_i}(y) - \nabla f_{t_i}(y)\big)\Big\| \le \sum_{i=1}^{r} \lambda_i \big\|\nabla F_{t_i}(y) - \nabla f_{t_i}(y)\big\| \\
&\le \sum_{i=1}^{r} \lambda_i K_f\big(1 + \sqrt{m}\,\|L^{-1}\|/2\big)\Delta = K_f\big(1 + \sqrt{m}\,\|L^{-1}\|/2\big)\Delta,
\end{aligned}$$

as claimed. □

Our next corollary relates Theorem 3.3 to the algorithm presented later. Let us note that the function $E$ in Corollary 3.4 is the same as the one used in the algorithm. We also note that, although in Corollary 3.4 we provide the error bound for the approximate gradient function in a general format, in practice we shall use $x = y_0$.

Corollary 3.4. Suppose that A1, A2, A3 and A4 hold. Let $Y = (y_0, y_1, \ldots, y_m)$ be a poised tuple of sample points centred at $y_0 \in X$ with respect to $f$. Set $\Delta = \max_{1 \le i \le m} \|y_i - y_0\|$ and suppose that $B(y_0;\Delta) \subseteq X$. For every $x \in B(y_0;\Delta)$, let $(t_1, \ldots, t_{r(x)}) \in A_f(x)^{r(x)}$, $\lambda \in S_{r(x)}$, $(\bar{t}_1, \ldots, \bar{t}_{\bar{r}(x)}) \in A_g(x)^{\bar{r}(x)}$, $\bar{\lambda} \in S_{\bar{r}(x)}$,

$$v_f(x) = \sum_{i=1}^{r(x)} \lambda_i \nabla f_{t_i}(x) \in \partial f(x), \qquad V_f(x) = \sum_{i=1}^{r(x)} \lambda_i \nabla F_{t_i}(x),$$

$$v_g(x) = \sum_{i=1}^{\bar{r}(x)} \bar{\lambda}_i \nabla g_{\bar{t}_i}(x) \in \partial g(x), \qquad V_g(x) = \sum_{i=1}^{\bar{r}(x)} \bar{\lambda}_i \nabla G_{\bar{t}_i}(x),$$

and

$$e(x) := \begin{cases} v_f(x), & \text{if } g(x) \le \varepsilon, \\ v_g(x), & \text{otherwise,} \end{cases} \tag{3.3}$$

and

$$E(x) := \begin{cases} V_f(x), & \text{if } g(x) \le \varepsilon, \\ V_g(x), & \text{otherwise.} \end{cases} \tag{3.4}$$

Then:

(i) The following error bound holds:

$$\|e(x) - E(x)\| \le \kappa\Delta, \quad \text{for all } x \in B(y_0;\Delta), \tag{3.5}$$

where $\kappa = \max\{K_f, K_g\}\big(1 + \sqrt{m}\,\|L^{-1}\|/2\big)$, $K_f$ is defined as in Lemma 2.7, $K_g$ is obtained by replacing $f$ by $g$ in Lemma 2.7, and $L$ is as defined in Theorem 2.9.

(ii) The function $E$ induced by (3.4) satisfies

$$\|E(x)\| \le \max\{L_f, L_g\} + \kappa\Delta, \quad \text{for all } x \in B(y_0;\Delta), \tag{3.6}$$

where $L_f$ and $L_g$ are defined as in Lemma 3.2.

Proof. (i): Use (3.3) and (3.4), and apply Theorem 3.3 to $f$ and $g$. (ii): Let $x \in X$. Using the triangle inequality, (3.2), and (3.5) we have $\|E(x)\| \le \|e(x)\| + \|e(x) - E(x)\| \le \max\{L_f, L_g\} + \kappa\Delta$. □


4 Algorithm and Discussion 

In this section we introduce the Derivative-Free $\varepsilon$-CoMirror algorithm and present a convergence analysis.

The Derivative-Free $\varepsilon$-CoMirror algorithm (DFO$_\varepsilon$CM)

Initialization. Input:

• $x_0 \in X$,

• $M \in \mathbb{R}_{++}$.

General step. For every $k \in \{1, 2, \ldots\}$:

• Select

$$0 < \Delta_k \le \frac{1}{\sqrt{k+1}}. \tag{4.1}$$

• Select a poised tuple $Y_k = (y_0, y_1, \ldots, y_m)$ centred at $y_0$ with respect to $f$ such that the set $\{y_0, y_1, \ldots, y_m\} \subseteq B(x_k;\Delta_k)$, $x_k = y_0$ and $\|L_k^{-1}\| \le M$, where $L_k = L(Y_k)$ is as defined in Theorem 2.9.

• Set

$$x_{k+1} = \operatorname*{argmin}_{x \in X}\big\{\langle t_k E_k - \nabla\omega(x_k), x\rangle + \omega(x)\big\}, \tag{4.2}$$

where

$$E_k := \begin{cases} V_f(x_k), & \text{if } g(x_k) \le \varepsilon, \\ V_g(x_k), & \text{otherwise,} \end{cases} \tag{4.3}$$

$$t_k := \frac{\sqrt{\Theta\alpha}}{\|E_k\|\sqrt{k}}, \tag{4.4}$$

and where $\alpha > 0$ is the strong convexity parameter of the strongly convex function $\omega : O \to \mathbb{R}$, $\Theta$ is the corresponding Bregman diameter of the set $X$, and $V_f$ and $V_g$ are defined as in Corollary 3.4.
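For intuition, the update (4.2) has a closed form in the simplest setting. The sketch below assumes $\omega = \tfrac12\|\cdot\|^2$ (so $\alpha = 1$) and a box-shaped set $X$, in which case the minimizer in (4.2) is the Euclidean projection of $x_k - t_k E_k$ onto the box, i.e. a coordinate-wise clip. The function names and the numerical values are illustrative assumptions, and $E_k$ is simply given rather than computed from interpolation models.

```python
import math


def step_length(theta, alpha, E, k):
    """t_k = sqrt(theta * alpha) / (||E_k|| * sqrt(k)), as in (4.4)."""
    norm_E = math.sqrt(sum(e * e for e in E))
    if norm_E == 0.0:
        raise ValueError("E_k = 0: the step length is undefined")
    return math.sqrt(theta * alpha) / (norm_E * math.sqrt(k))


def euclidean_mirror_step(x, E, t, lower, upper):
    """Solve (4.2) for omega = 0.5*||.||^2 on a box: clip(x - t * E)."""
    return [min(max(xi - t * ei, lo), hi)
            for xi, ei, lo, hi in zip(x, E, lower, upper)]


# Hypothetical data: X = [0, 1]^2, for which the Bregman diameter of
# 0.5*||.||^2 is half the squared Euclidean diameter of the box.
lower, upper = [0.0, 0.0], [1.0, 1.0]
alpha = 1.0
theta = 0.5 * sum((hi - lo) ** 2 for lo, hi in zip(lower, upper))

x_k, E_k, k = [0.5, 0.5], [3.0, -4.0], 4
t_k = step_length(theta, alpha, E_k, k)          # 1 / (5 * 2)
x_next = euclidean_mirror_step(x_k, E_k, t_k, lower, upper)
```

With a different $\omega$ (for example the negative entropy) the same update generally has a different closed form or requires a small inner optimization.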
Remark 4.1.

(i) In generating the points of the tuple $Y_k \in \mathbb{R}^{m \times (m+1)}$ we need to check that $\|L_k^{-1}\| \le M$. If this inequality fails, then we resample. It is always possible to generate the tuple $Y_k$ for all $k \in \mathbb{N}$ provided that $M$ is set to be sufficiently large [25]. For a detailed discussion on how to choose $M$ we refer the reader to [9].

(ii) The poised tuple $Y_k = (y_0, y_1, \ldots, y_m)$ must satisfy $\max_{i \in \{1,\ldots,m\}} \|y_i - x_k\| \le \Delta_k$ to guarantee that the error bound in Theorem 3.3 still holds true. This does not conflict with (i) because, by the definition of the matrix $L$ in (2.4), the value of $\|L^{-1}\|$ remains unchanged under scaling or shifting.

(iii) The update of $x_k$ in (4.2) is well defined, since the function $\langle t_k E_k - \nabla\omega(x_k), \cdot\rangle + \omega$ is strongly convex and differentiable over $X$, and therefore it has a unique minimizer over $X$.

(iv) The step length $t_k$ is well defined for all $k \in \{1, 2, \ldots\}$ except when $E_k = 0$, in which case either we have a local minimum, or we change the search radius $\Delta_k$ to get a better approximation of the gradients. Moreover, the Bregman diameter $\Theta$ is finite by Lemma 2.14. Finally, by Lemma 2.12(ii), we have that $D_\omega(x,y) \ge \frac{\alpha}{2}\|x - y\|^2$, and therefore, since $X$ is not a singleton, the Bregman diameter is strictly positive.

(v) In general, the Bregman diameter $\Theta$ is not easy to calculate. However, if the set $X$ is simple and the function $\omega$ is separable, calculating $\Theta$ becomes simpler. For example, if $X = [\alpha_1, \beta_1] \times \cdots \times [\alpha_m, \beta_m]$ and $\omega(x) = \sum_{i=1}^{m} \omega_i(x_i)$, then $\Theta = \sum_{i=1}^{m} D_{\omega_i}(\alpha_i, \beta_i)$.
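The separable case in Remark 4.1(v) is easy to compute. The sketch below is an assumption-laden illustration: for each coordinate it takes the larger of the two endpoint orderings $D_{\omega_i}(\alpha_i, \beta_i)$ and $D_{\omega_i}(\beta_i, \alpha_i)$, which coincides with the single term in the remark whenever $D_{\omega_i}$ is symmetric (as for $\omega_i(s) = \tfrac12 s^2$). The helper names and the boxes are hypothetical.

```python
import math


def bregman_1d(omega, d_omega, u, v):
    """One-dimensional Bregman distance omega(u) - omega(v) - omega'(v)(u - v)."""
    return omega(u) - omega(v) - d_omega(v) * (u - v)


def box_bregman_diameter(omegas, d_omegas, lower, upper):
    """Sum over coordinates of the larger endpoint-to-endpoint distance."""
    total = 0.0
    for w, dw, lo, hi in zip(omegas, d_omegas, lower, upper):
        total += max(bregman_1d(w, dw, lo, hi), bregman_1d(w, dw, hi, lo))
    return total


half_sq = lambda s: 0.5 * s * s
d_half_sq = lambda s: s
ent = lambda s: s * math.log(s)          # negative entropy, s > 0
d_ent = lambda s: math.log(s) + 1.0

# Box [0, 2] x [-1, 1] with the quadratic kernel: 0.5*2^2 + 0.5*2^2.
theta_sq = box_bregman_diameter([half_sq] * 2, [d_half_sq] * 2,
                                [0.0, -1.0], [2.0, 1.0])
# Interval [1, e] with the entropy kernel: the two orderings differ.
theta_ent = box_bregman_diameter([ent], [d_ent], [1.0], [math.e])
```

For the entropy kernel on $[1, e]$ the two orderings give $e - 2$ and $1$, which shows why the ordering can matter for non-symmetric Bregman distances.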

4.1 Convergence Analysis 

We devote this subsection to studying the convergence of the algorithm. Lemma 4.2 and its proof are only a minor adaptation of [3, Lemma 2.2]. For the sake of completeness, we include the adapted proof.

Lemma 4.2. Let $(x_k)_{k \in \mathbb{N}}$ be the sequence generated by DFO$_\varepsilon$CM. Let $i < j$ be two strictly positive integers. Then

$$\sum_{k=i}^{j} t_k\langle E_k, x_k - u\rangle \le \Theta + \frac{1}{2\alpha}\sum_{k=i}^{j} t_k^2\|E_k\|^2 \tag{4.5}$$

for every $u \in X$.

Proof. By the optimality condition in (4.2) we have

$$\langle t_k E_k - \nabla\omega(x_k) + \nabla\omega(x_{k+1}), u - x_{k+1}\rangle \ge 0 \quad \text{for every } u \in X.$$

Hence,

$$t_k\langle E_k, u - x_{k+1}\rangle \ge \langle \nabla\omega(x_k) - \nabla\omega(x_{k+1}), u - x_{k+1}\rangle \quad \text{for every } u \in X. \tag{4.6}$$

The three-point property of the Bregman distance [7, Lemma 3.1] tells us

$$D_\omega(u, x_{k+1}) - D_\omega(u, x_k) + D_\omega(x_{k+1}, x_k) = \langle \nabla\omega(x_k) - \nabla\omega(x_{k+1}), u - x_{k+1}\rangle. \tag{4.7}$$

Combining (4.6) and (4.7) yields

$$t_k\langle E_k, u - x_{k+1}\rangle \ge D_\omega(u, x_{k+1}) - D_\omega(u, x_k) + D_\omega(x_{k+1}, x_k).$$

That is,

$$t_k\langle E_k, x_{k+1} - u\rangle \le D_\omega(u, x_k) - D_\omega(x_{k+1}, x_k) - D_\omega(u, x_{k+1}).$$

Adding $t_k\langle E_k, x_k - x_{k+1}\rangle$ to both sides of the above inequality and using Lemma 2.12(ii) and the Cauchy-Schwarz inequality we get

$$\begin{aligned}
t_k\langle E_k, x_k - u\rangle &\le D_\omega(u, x_k) - D_\omega(u, x_{k+1}) - D_\omega(x_{k+1}, x_k) + t_k\langle E_k, x_k - x_{k+1}\rangle \\
&\le D_\omega(u, x_k) - D_\omega(u, x_{k+1}) - \frac{\alpha}{2}\|x_k - x_{k+1}\|^2 + t_k\|E_k\|\,\|x_k - x_{k+1}\|.
\end{aligned}$$

Notice that $t_k\|E_k\|\,\|x_k - x_{k+1}\| - \frac{\alpha}{2}\|x_k - x_{k+1}\|^2$ is a quadratic function of $\|x_k - x_{k+1}\|$ that has a maximum value of $\frac{1}{2\alpha}t_k^2\|E_k\|^2$, i.e., $t_k\|E_k\|\,\|x_k - x_{k+1}\| - \frac{\alpha}{2}\|x_k - x_{k+1}\|^2 \le \frac{1}{2\alpha}t_k^2\|E_k\|^2$. This yields

$$t_k\langle E_k, x_k - u\rangle \le D_\omega(u, x_k) - D_\omega(u, x_{k+1}) + \frac{1}{2\alpha}t_k^2\|E_k\|^2.$$

Summing the last inequality over $k \in \{i, i+1, \ldots, j\}$ we obtain

$$\sum_{k=i}^{j} t_k\langle E_k, x_k - u\rangle \le D_\omega(u, x_i) - D_\omega(u, x_{j+1}) + \sum_{k=i}^{j} \frac{1}{2\alpha}t_k^2\|E_k\|^2.$$

Using the definition of $\Theta$ we note that $D_\omega(u, x_i) - D_\omega(u, x_{j+1}) \le \Theta$, from which we get (4.5). □
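The quadratic bound used in the proof of Lemma 4.2 can be spelled out in one line. Writing $s = \|x_k - x_{k+1}\| \ge 0$ and $\varphi(s) = t_k\|E_k\|\,s - \frac{\alpha}{2}s^2$ (the symbols $s$ and $\varphi$ are introduced here only for this worked step), the maximizer is found by setting the derivative to zero:

```latex
\varphi'(s) = t_k\|E_k\| - \alpha s = 0
\;\Longrightarrow\;
s^{*} = \frac{t_k\|E_k\|}{\alpha},
\qquad
\varphi(s^{*}) = \frac{t_k^{2}\|E_k\|^{2}}{\alpha} - \frac{\alpha}{2}\cdot\frac{t_k^{2}\|E_k\|^{2}}{\alpha^{2}}
= \frac{1}{2\alpha}\,t_k^{2}\|E_k\|^{2}.
```

Since $\varphi$ is concave, this stationary value is its global maximum.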

The following theorem presents the efficiency estimate for the Derivative-Free $\varepsilon$-CoMirror method. In proving Theorem 4.3 we are motivated by the techniques used in the proof of [3, Theorem 2.1]. Given $n \in \mathbb{N}$, we denote the set of indices of the $\varepsilon$-feasible solutions among the first $n$ iterations by

$$I_n^{\varepsilon} = \{k \in \{1, 2, \ldots, n\} : g(x_k) \le \varepsilon\}.$$

Theorem 4.3. Suppose that Assumptions A1, A2, A3 and A4 hold. Let $\varepsilon > 0$ and let $(x_k)_{k \in \mathbb{N}}$ be the sequence generated by DFO$_\varepsilon$CM. Denote by $f_{\mathrm{opt}}$ the optimal function value of (3.1). Then for every $n \in \{4, 5, \ldots\}$

$$\min\Big\{\min_{k \in I_n^{\varepsilon}}\big(f(x_k) - f_{\mathrm{opt}}\big),\ \varepsilon\Big\} \le \frac{C}{\sqrt{n}},$$

where

$$C = 2\sqrt{\frac{\Theta}{\alpha}}\cdot\max\{\kappa_1, \kappa_2\}\cdot\frac{1 + \ln(2)}{2 - \sqrt{2}} + \sqrt{2}\,\kappa_2\Omega,$$

$$\kappa_1 = \max\{L_f, L_g\},$$

$$\kappa_2 = \max\{K_f, K_g\}\big(1 + \sqrt{m}\,M/2\big),$$

$$\Omega = \max_{x, y \in X}\|x - y\|,$$

$L_f$ and $L_g$ are as defined in (3.2), $K_f$ and $K_g$ are as in Corollary 3.4, and $M > 0$ satisfies $\|L_k^{-1}\| \le M$ for all $k \in \{1, 2, \ldots\}$.
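To get a feeling for the size of the bound, the constant can be evaluated directly. The sketch below assumes the form $C = C_1 + C_2$ with $C_1 = 2\sqrt{\Theta/\alpha}\,\max\{\kappa_1,\kappa_2\}\,\frac{1+\ln 2}{2-\sqrt2}$ and $C_2 = \sqrt2\,\kappa_2\Omega$, and $\kappa_2 = \max\{K_f, K_g\}(1 + \sqrt{m}M/2)$; the function names and all input values are hypothetical.

```python
import math


def efficiency_constant(theta, alpha, L_f, L_g, K_f, K_g, m, M, diam):
    """Evaluate C = C_1 + C_2 under the assumed form stated in the lead-in."""
    kappa1 = max(L_f, L_g)
    kappa2 = max(K_f, K_g) * (1.0 + math.sqrt(m) * M / 2.0)
    c1 = (2.0 * math.sqrt(theta / alpha) * max(kappa1, kappa2)
          * (1.0 + math.log(2.0)) / (2.0 - math.sqrt(2.0)))
    c2 = math.sqrt(2.0) * kappa2 * diam
    return c1 + c2


def iterations_for_accuracy(C, tol):
    """Smallest n >= 4 (up to rounding) with C / sqrt(n) <= tol."""
    return max(4, math.ceil((C / tol) ** 2))


# Hypothetical inputs: exact gradients (K_f = K_g = 0) isolate C_1.
C = efficiency_constant(theta=2.0, alpha=2.0, L_f=1.0, L_g=1.0,
                        K_f=0.0, K_g=0.0, m=2, M=10.0, diam=3.0)
n_needed = iterations_for_accuracy(C, tol=0.1)
```

The $1/\sqrt{n}$ rate means that halving the target accuracy roughly quadruples the number of iterations required by the bound.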
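Proof. Using Assumption A4, suppose that $x_{\mathrm{opt}}$ is an optimal solution of (3.1). Fix $n \in \{1, 2, \ldots\}$ and $k \in \{1, 2, \ldots, n\}$. We begin by considering the following two cases.

Case I: $k \in I_n^{\varepsilon}$. Then $g(x_k) \le \varepsilon$, and, by (4.3), (3.3), and (3.4) we have $e_k := e(x_k) = v_f(x_k) \in \partial f(x_k)$ and $E_k := E(x_k) = V_f(x_k)$, and hence

$$f(x_k) \le f(x_{\mathrm{opt}}) + \langle e_k, x_k - x_{\mathrm{opt}}\rangle.$$

Therefore, using the Cauchy-Schwarz inequality and the error bound in (3.5),

$$\begin{aligned}
f(x_k) &\le f(x_{\mathrm{opt}}) + \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \langle e_k - E_k, x_k - x_{\mathrm{opt}}\rangle \\
&\le f(x_{\mathrm{opt}}) + \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \|e_k - E_k\|\,\|x_k - x_{\mathrm{opt}}\| \\
&\le f(x_{\mathrm{opt}}) + \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_k\Omega.
\end{aligned}$$

Hence

$$f(x_k) - f(x_{\mathrm{opt}}) \le \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_k\Omega. \tag{4.8}$$

Case II: $k \notin I_n^{\varepsilon}$. Then $g(x_k) > \varepsilon$. Using (4.3), (3.3), and (3.4) we have $e_k = v_g(x_k) \in \partial g(x_k)$ and $E_k = V_g(x_k)$, and hence

$$g(x_k) \le g(x_{\mathrm{opt}}) + \langle e_k, x_k - x_{\mathrm{opt}}\rangle.$$

Since $g(x_{\mathrm{opt}}) \le 0$ we have

$$\begin{aligned}
\varepsilon < g(x_k) &\le g(x_{\mathrm{opt}}) + \langle e_k, x_k - x_{\mathrm{opt}}\rangle \\
&\le \langle e_k, x_k - x_{\mathrm{opt}}\rangle = \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \langle e_k - E_k, x_k - x_{\mathrm{opt}}\rangle.
\end{aligned}$$

Hence, using the Cauchy-Schwarz inequality, the assumption that $\|L_k^{-1}\| \le M$ for all $k \in \{1, 2, \ldots\}$, and the error bound in (3.5) we have

$$\begin{aligned}
\varepsilon &< \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \|e_k - E_k\|\,\|x_k - x_{\mathrm{opt}}\| \\
&\le \langle E_k, x_k - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_k\Omega.
\end{aligned} \tag{4.9}$$

By combining Case I and Case II, we have

$$\langle E_k, x_k - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_k\Omega \ge \begin{cases} f(x_k) - f(x_{\mathrm{opt}}), & \text{if } k \in I_n^{\varepsilon}, \\ \varepsilon, & \text{otherwise.} \end{cases} \tag{4.10}$$

Using (4.10) we have for all $1 \le l \le n$, with $\Delta_l \le 1/\sqrt{l+1}$,

$$\min\Big\{\min_{k \in I_n^{\varepsilon}}\big(f(x_k) - f(x_{\mathrm{opt}})\big),\ \varepsilon\Big\} \le \langle E_l, x_l - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_l\Omega.$$

Let $n_0 \in \{1, 2, \ldots, n\}$; then using (4.1)

$$\begin{aligned}
\min\Big\{\min_{k \in I_n^{\varepsilon}}\big(f(x_k) - f(x_{\mathrm{opt}})\big),\ \varepsilon\Big\} &\le \min_{n_0 \le l \le n}\big(\langle E_l, x_l - x_{\mathrm{opt}}\rangle + \kappa_2\Delta_l\Omega\big) \\
&\le \min_{n_0 \le l \le n}\langle E_l, x_l - x_{\mathrm{opt}}\rangle + \kappa_2\Omega\max_{n_0 \le l \le n}\Delta_l \\
&\le \min_{n_0 \le l \le n}\langle E_l, x_l - x_{\mathrm{opt}}\rangle + \frac{\kappa_2\Omega}{\sqrt{n_0 + 1}}.
\end{aligned} \tag{4.11}$$

Substituting $u = x_{\mathrm{opt}}$, $i = n_0$, $j = n$ in Lemma 4.2 we see that

$$\sum_{k=n_0}^{n} t_k\langle E_k, x_k - x_{\mathrm{opt}}\rangle \le \Theta + \frac{1}{2\alpha}\sum_{k=n_0}^{n} t_k^2\|E_k\|^2. \tag{4.12}$$

On the other hand, since $X$ is not a singleton, we have $t_k > 0$ for every $k \in \{1, 2, \ldots, n\}$, and thus

$$\sum_{k=n_0}^{n} t_k\langle E_k, x_k - x_{\mathrm{opt}}\rangle \ge \Big(\min_{n_0 \le k \le n}\langle E_k, x_k - x_{\mathrm{opt}}\rangle\Big)\sum_{k=n_0}^{n} t_k. \tag{4.13}$$

Combining (4.12) and (4.13) yields

$$\min_{n_0 \le k \le n}\langle E_k, x_k - x_{\mathrm{opt}}\rangle \le \frac{\Theta + \frac{1}{2\alpha}\sum_{k=n_0}^{n} t_k^2\|E_k\|^2}{\sum_{k=n_0}^{n} t_k}. \tag{4.14}$$

Using (4.4), we have

$$\sum_{k=n_0}^{n} t_k^2\|E_k\|^2 = \Theta\alpha\sum_{k=n_0}^{n}\frac{1}{k} \tag{4.15}$$

and

$$\sum_{k=n_0}^{n} t_k = \sqrt{\Theta\alpha}\sum_{k=n_0}^{n}\frac{1}{\|E_k\|\sqrt{k}}. \tag{4.16}$$

We recall that $\|L_k^{-1}\| \le M$ for all $k \in \{1, 2, \ldots\}$, $\kappa_1 = \max\{L_f, L_g\}$ and $\kappa_2 = \max\{K_f, K_g\}(1 + \sqrt{m}\,M/2)$. Now, for every $k \in \{1, 2, \ldots\}$, using Corollary 3.4 and (4.1) we have

$$\begin{aligned}
\|E_k\|\sqrt{k} &\le (\kappa_1 + \kappa_2\Delta_k)\sqrt{k} \le \kappa_1\sqrt{k} + \kappa_2\frac{\sqrt{k}}{\sqrt{k+1}} \\
&\le \kappa_1\sqrt{k} + \kappa_2 \le \max\{\kappa_1, \kappa_2\}\big(\sqrt{k} + 1\big) \\
&\le 2\max\{\kappa_1, \kappa_2\}\sqrt{k}.
\end{aligned} \tag{4.17}$$

Using (4.16) and (4.17) we get

$$\sum_{k=n_0}^{n} t_k \ge \frac{\sqrt{\Theta\alpha}}{2\max\{\kappa_1, \kappa_2\}}\sum_{k=n_0}^{n}\frac{1}{\sqrt{k}}. \tag{4.18}$$

Using equations (4.15) and (4.18), inequality (4.14) becomes

$$\min_{n_0 \le l \le n}\langle E_l, x_l - x_{\mathrm{opt}}\rangle \le \frac{2\Theta\max\{\kappa_1, \kappa_2\}\Big(1 + \frac{1}{2}\sum_{k=n_0}^{n}\frac{1}{k}\Big)}{\sqrt{\Theta\alpha}\sum_{k=n_0}^{n}\frac{1}{\sqrt{k}}}. \tag{4.19}$$

Now, set $n_0 = \lfloor n/2\rfloor$. On the one hand, using (4.19) and Lemma A.1 we get

$$\min_{n_0 \le l \le n}\langle E_l, x_l - x_{\mathrm{opt}}\rangle \le \frac{C_1}{\sqrt{n}}, \tag{4.20}$$

where $C_1 = 2\sqrt{\frac{\Theta}{\alpha}}\max\{\kappa_1, \kappa_2\}\frac{1 + \ln(2)}{2 - \sqrt{2}}$. On the other hand, using the fact that $\lfloor n/2\rfloor + 1 > n/2$ we have

$$\frac{\kappa_2\Omega}{\sqrt{n_0 + 1}} = \frac{\kappa_2\Omega}{\sqrt{\lfloor n/2\rfloor + 1}} \le \frac{C_2}{\sqrt{n}}, \tag{4.21}$$

where $C_2 = \sqrt{2}\,\kappa_2\Omega$. Using (4.11), (4.20) and (4.21) we deduce that

$$\min\Big\{\min_{k \in I_n^{\varepsilon}} f(x_k) - f(x_{\mathrm{opt}}),\ \varepsilon\Big\} \le \frac{C_1 + C_2}{\sqrt{n}} = \frac{C}{\sqrt{n}}, \tag{4.22}$$

which completes the proof. □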

5 Numerical Results 

In this section we provide some numerical results of the DFO$_\varepsilon$CM algorithm. The DFO$_\varepsilon$CM algorithm was implemented in MATLAB. To begin we examine three academic test problems from [10], [11]. We then apply the DFO$_\varepsilon$CM algorithm to a simulation test problem from [16].

5.1 Academic Test Problems

We first consider three academic test problems from [10], [11]. In working with these problems, we rewrite the constraint functions as a single constraint via a max function. For example, in Test Problem 1 the constraint functions are rewritten as $g(x_1, x_2) = \max_{1 \le i \le 3} g_i(x)$, where $g_1(x_1, x_2) = -x_1$, $g_2(x_1, x_2) = x_1 - 1$ and $g_3(x_1, x_2) = x_2$.
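The aggregation of several constraints into a single max function can be sketched as follows for Test Problem 1. The function names and the tolerance value are illustrative assumptions; the three pieces are the $g_1$, $g_2$, $g_3$ stated above.

```python
def g(x):
    """Aggregate constraint g = max(g1, g2, g3) for Test Problem 1."""
    x1, x2 = x
    return max(-x1, x1 - 1.0, x2)


def is_eps_feasible(x, eps):
    """A point is eps-feasible when g(x) <= eps."""
    return g(x) <= eps


inside = g((0.5, -0.2))                            # third piece is active
outside = g((1.2, -1.0))                           # second piece is violated
nearly = is_eps_feasible((1.0005, 0.0), eps=1e-3)  # slightly outside, eps-feasible
```

Because each piece is smooth, this also gives a finite max (lower-$\mathcal{C}^2$) representation of the constraint in the sense of Example 2.3.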
(i) Test Problem 1

($x \in \mathbb{R}^2$) Minimize $-x_1 - 2x_2$
subject to $0 \le x_1 \le 1$,
$x_2 \le 0$.

(ii) Test Problem 2

($x \in \mathbb{R}^2$) Minimize $6x_1^2 + x_2^2 - 60x_1 - 8x_2 + 166$
subject to $0 \le x_1 \le 10$,
$0 \le x_2 \le 10$,
$x_1 + x_2 - x_1x_2 \ge 0$,
$x_1 + x_2 - 3 \ge 0$.

(iii) Test Problem 3

($x \in \mathbb{R}^2$) Minimize $7x_1^2 + 3x_2^2 - 84x_1 - 34x_2 + 300$
subject to $0 \le x_1 \le 10$,
$0 \le x_2 \le 10$,
$x_1x_2 - 1 \ge 0$,
$9 - x_1^2 - x_2^2 \ge 0$.

Remark 5.1. In [10] and [11], the authors mention that their algorithms could not find an optimal solution to Test Problem 3. This is due to them incorrectly stating that the optimal value is $-97.30952$. The correct optimal value is $f_{\mathrm{opt}} \approx 84.6710$, which we demonstrate below.
Define $f$, $g_1$, and $g_2$ as follows,

$f(x_1, x_2) = 7x_1^2 + 3x_2^2 - 84x_1 - 34x_2 + 300$,
$g_1(x_1, x_2) = 1 - x_1x_2 \le 0$, and
$g_2(x_1, x_2) = x_1^2 + x_2^2 - 9 \le 0$.

Notice that $f(x_1, x_2) = 7(x_1 - 6)^2 + 3(x_2 - \tfrac{17}{3})^2 - \tfrac{145}{3}$, so $f$ is strictly convex.
The constraint set $\{(x_1, x_2) \in \mathbb{R}^2 : 0 \le x_1 \le 10,\ 0 \le x_2 \le 10,\ g_1(x_1, x_2) \le 0 \text{ and } g_2(x_1, x_2) \le 0\}$ is also convex. Let $a$ be the positive real root of $p(x) = 16x^4 - 336x^3 + 1909x^2 + 3024x - 15876$.
Then at x\ = a, and X2 = ^fy^ 3 — y^a 2 + Tjf « + 77, with X = —1 — 37^6! + |a 2 — -jj^a 3 we have 
1 — X1X2 < 0, x\ +X2 = 9 and Vf(xi,X2) = XVg2(xi,X2); that is first order optimality holds. As the 
objective function and constraint set are convex, this implies optimality. The corresponding optimal 
value is / Q pt ~ 84.6710. Approximate values 0/(^1,^2) = (2.6390, 1.4267) and X ~ —8.9150. 
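The optimality claim in Remark 5.1 can be checked numerically. The sketch below uses NumPy and takes a shortcut chosen for this illustration only: instead of the closed-form expressions for $x_2$ and $\lambda$, it recovers $x_2$ from the active constraint $x_1^2 + x_2^2 = 9$ and $\lambda$ from the first component of $\nabla f = \lambda \nabla g_2$, then evaluates the residual of the second component.

```python
import numpy as np


def f(x1, x2):
    return 7 * x1**2 + 3 * x2**2 - 84 * x1 - 34 * x2 + 300


# Positive real root a of p(x) = 16x^4 - 336x^3 + 1909x^2 + 3024x - 15876.
roots = np.roots([16, -336, 1909, 3024, -15876])
a = max(r.real for r in roots if abs(r.imag) < 1e-8 and r.real > 0)

x1 = a
x2 = np.sqrt(9 - a**2)            # g_2 active: x1^2 + x2^2 = 9
lam = (14 * x1 - 84) / (2 * x1)   # first component of grad f = lam * grad g_2

# Residual of the second component of grad f = lam * grad g_2.
kkt_residual = (6 * x2 - 34) - lam * (2 * x2)

print(x1, x2, lam, f(x1, x2), kkt_residual)
```

If the remark is correct, the printed point, multiplier and objective value should agree with the approximate values quoted there, and the residual should vanish up to rounding.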

We test DFO$_\varepsilon$CM on each of these three test problems using two options for creating the Bregman distance. In the results for these test problems we use $\omega_1 = \|\cdot\|^2$, and $\omega_2(x)$ to denote the (negative) entropy $\sum_{i=1}^{m} x_i \ln(x_i)$. In Table 1 we compare our results on the three test problems to the results obtained by the Pattern Search method and Simplex Search method introduced in [10]. Note that, although in Test Problems 2 and 3 the constraint functions are nonconvex, the generated constraint set is convex. This case is not covered by Theorem 4.3; however, DFO$_\varepsilon$CM still gives a good fit.
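To make the two choices of $\omega$ concrete, the following Python sketch evaluates the Bregman distance $D_\omega(x,y) = \omega(x) - \omega(y) - \langle \nabla\omega(y), x - y\rangle$ for the squared norm and for the negative entropy. This is the standard definition; the function names and the sample points are assumptions made for this example, and the entropy case requires strictly positive vectors.

```python
import numpy as np


def bregman(omega, grad_omega, x, y):
    """Bregman distance D(x, y) = omega(x) - omega(y) - <grad omega(y), x - y>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return omega(x) - omega(y) - grad_omega(y) @ (x - y)


def omega1(x):
    """Squared Euclidean norm."""
    return x @ x


def grad_omega1(x):
    return 2.0 * x


def omega2(x):
    """Negative entropy, defined for strictly positive vectors."""
    return np.sum(x * np.log(x))


def grad_omega2(x):
    return np.log(x) + 1.0


x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]
d1 = bregman(omega1, grad_omega1, x, y)  # reduces to ||x - y||^2
d2 = bregman(omega2, grad_omega2, x, y)  # reduces to sum(x * log(x / y)) since sum(x) == sum(y)
print(d1, d2)
```

With the squared norm the distance is the squared Euclidean distance, while the entropy yields a Kullback-Leibler-type divergence.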

Examining Table 1 we note that DFO$_\varepsilon$CM outperformed both the Pattern Search and Simplex Search algorithms on Test Problems 2 and 3. On Test Problem 1, DFO$_\varepsilon$CM did not perform as well, but still required noticeably fewer function evaluations than the Pattern Search and Simplex Search methods.
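Since the algorithm only sees function values, each approximate subgradient costs function evaluations. As a rough illustration, the Python sketch below builds the gradient of the linear interpolant through $x$ and the points $x + \Delta e_i$. This coordinate sample set, the name `simplex_gradient` and the step `delta` are assumptions made for the example; the sample set used in the actual implementation is not specified in this section. In $\mathbb{R}^m$ this choice uses $m + 1$ function values per gradient estimate.

```python
import numpy as np


def simplex_gradient(f, x, delta):
    """Gradient of the linear interpolant of f through x and x + delta * e_i."""
    x = np.asarray(x, dtype=float)
    m = x.size
    directions = delta * np.eye(m)   # row i is the displacement y_i - x
    f0 = f(x)
    df = np.array([f(x + d) - f0 for d in directions])
    # Solve directions @ g = df; the interpolant is f0 + g @ (y - x).
    return np.linalg.solve(directions, df)


def f3(x):
    """Objective of Test Problem 3."""
    return 7 * x[0]**2 + 3 * x[1]**2 - 84 * x[0] - 34 * x[1] + 300


g_approx = simplex_gradient(f3, [2.0, 1.0], delta=1e-6)
g_exact = np.array([14 * 2.0 - 84, 6 * 1.0 - 34])
print(g_approx, g_exact)
```

For a smooth function the interpolation error shrinks with `delta`, and for a linear function the estimate is exact up to rounding.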

5.2 Simulation Test Problem 

In this section we test the algorithm on a 12-dimensional simulated maximization problem given in [16]. We used the same starting points given in [16]: $x_0 = (1,0,\dots,0)$ and $\hat{x}_0 = (2,0.5,\dots,0.5)$, which are vectors in $\mathbb{R}^{12}$. The results are reported in Table 2. We compare our results to the results obtained from the Direct Pattern Search Method (DPS) and the Direct Random Search Method
with Simulated Annealing (DRS+SA) in [16]. As the constraint set for this problem is a system of linear inequalities, the methods used in [16] used exact gradients when dealing with constraints. Objective function evaluations are provided via deterministic simulation.

Table 1: Comparing results for Test Problems 1, 2, and 3.

| Test Problem | Results | DFO CoMirror, $\omega_1(x)$ | DFO CoMirror, $\omega_2(x)$ | Pattern Search Algorithm [10] | Simplex Search Algorithm [10] |
|---|---|---|---|---|---|
| 1 | Function value | -0.9542 | -0.9645 | -1 | -1 |
| | $f$ evaluations | 78 | 99 | 195 | 158 |
| | $g$ evaluations | 162 | 141 | 157 | 129 |
| 2 | Function value | 7.5587 | 7.5580 | 7.625 | 7.625 |
| | $f$ evaluations | 78 | 81 | 138 | 146 |
| | $g$ evaluations | 122 | 111 | 138 | 118 |
| 3 | Function value | 84.7096 | 84.7108 | 85.6610 | 85.6200 |
| | $f$ evaluations | 78 | 75 | 154 | 198 |
| | $g$ evaluations | 122 | 125 | 154 | 153 |

The results in [16] report that, using 3000 function calls, the DPS gives an optimal value of 0.8327 with $x_0$ as starting point and an optimal value of 0.1747 with $\hat{x}_0$ as starting point, whereas, using 3000 function calls, the heuristic DRS+SA gives an optimal value of 0.9628 with $x_0$ as starting point and an optimal value of 0.9671 with $\hat{x}_0$ as starting point.



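Table 2: Results of DFO CoMirror algorithm

| $f$ calls | $x_0$, $\varepsilon = 0.01$ | $x_0$, $\varepsilon = 0.005$ | $x_0$, $\varepsilon = 0.001$ | $\hat{x}_0$, $\varepsilon = 0.01$ | $\hat{x}_0$, $\varepsilon = 0.005$ | $\hat{x}_0$, $\varepsilon = 0.001$ |
|---|---|---|---|---|---|---|
| 100 | 0.7329 | | | 0.8875 | | 0.8968 |
| 500 | 0.9400 | 0.9387 | 0.9342 | 0.9220 | 0.9210 | 0.8332 |
| 1000 | 0.9452 | 0.9514 | 0.9447 | 0.9277 | 0.9256 | 0.9334 |
| 3000 | 0.9547 | 0.9551 | 0.9546 | 0.9500 | 0.9467 | 0.9538 |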



In Table 2 we see that with 500 function calls, DFO$_\varepsilon$CM is able to achieve a significantly better fit than the DPS. While the fit for DFO$_\varepsilon$CM never quite achieves the quality of the DRS+SA method, it comes quite close after 3000 function calls. This difference could be explained by the fact that the DRS+SA method employs heuristics to break free of local minimizers.

6 Conclusion 

In this paper we developed the convergence analysis required to generate a derivative-free comirror algorithm, DFO$_\varepsilon$CM. Furthermore, we provided some numerical results from the implementation of the algorithm in MATLAB. One natural line of future research is to adapt the algorithm to deal with the problem

$$(P1): \min\{f(x) : g(x) \le 0\}, \tag{6.1}$$

i.e., $X = \mathbb{R}^m$, and to prove convergence. Another line of future research is examining the convergence in the case where $g$ is not necessarily convex, but the constraint set remains convex. Results from Test Problems 2 and 3 suggest that this is possible.



A Appendix 

Lemma A.1. For any integer $n \in \{4, 5, \dots\}$ the following inequalities hold:

$$\sum_{k=\lfloor n/2 \rfloor}^{n} \frac{1}{k} \le 2\ln(2), \tag{A.1}$$

$$\sum_{k=\lfloor n/2 \rfloor}^{n} \frac{1}{\sqrt{k}} \ge (2-\sqrt{2})\sqrt{n}. \tag{A.2}$$

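Proof. To see inequality (A.1), notice

$$\sum_{k=\lfloor n/2 \rfloor}^{n} \frac{1}{k} \le \sum_{k=\lfloor n/2 \rfloor - 1}^{n-1} \int_{k}^{k+1} \frac{1}{x}\,dx = \int_{\lfloor n/2 \rfloor - 1}^{n} \frac{1}{x}\,dx = \ln\left(\frac{n}{\lfloor n/2 \rfloor - 1}\right). \tag{A.3}$$

Since $2\ln(2) = \ln(4)$, it suffices to show that $\frac{n}{\lfloor n/2 \rfloor - 1} \le 4$. We now consider two cases ($n$ is even and $n$ is odd). Case I: suppose $n = 2m$ with $m \in \{2, 3, \dots\}$. Then

$$\frac{n}{\lfloor n/2 \rfloor - 1} \le 4 \iff \frac{2m}{m-1} \le 4 \iff n = 2m \ge 4. \tag{A.4}$$

Case II: suppose $n = 2m + 1$ with $m \in \{2, 3, \dots\}$. Then

$$\frac{n}{\lfloor n/2 \rfloor - 1} \le 4 \iff \frac{2m+1}{m-1} \le 4 \iff n = 2m + 1 \ge 7. \tag{A.5}$$

Moreover, for $n = 5$ direct computation shows that $\sum_{k=2}^{5} \frac{1}{k} = \frac{77}{60} \le 2\ln(2)$, which together with (A.3), (A.4) and (A.5) proves the first inequality for all $n \in \{4, 5, \dots\}$.
Finally,

$$\sum_{k=\lfloor n/2 \rfloor}^{n} \frac{1}{\sqrt{k}} \ge \sum_{k=\lfloor n/2 \rfloor}^{n} \int_{k}^{k+1} \frac{1}{\sqrt{x}}\,dx = \int_{\lfloor n/2 \rfloor}^{n+1} \frac{1}{\sqrt{x}}\,dx \ge \int_{n/2}^{n} \frac{1}{\sqrt{x}}\,dx = (2-\sqrt{2})\sqrt{n},$$

which proves inequality (A.2). □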
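As a sanity check on Lemma A.1 (not a substitute for the proof), the two inequalities can be tested numerically over a finite range of $n$. The helper names and the upper limit of 2000 in this Python sketch are arbitrary choices made for the illustration.

```python
import math


def harmonic_tail(n):
    """Left-hand side of (A.1): sum of 1/k for k = floor(n/2), ..., n."""
    return sum(1.0 / k for k in range(n // 2, n + 1))


def inv_sqrt_tail(n):
    """Left-hand side of (A.2): sum of 1/sqrt(k) for k = floor(n/2), ..., n."""
    return sum(1.0 / math.sqrt(k) for k in range(n // 2, n + 1))


def lemma_holds(n):
    a1 = harmonic_tail(n) <= 2 * math.log(2)
    a2 = inv_sqrt_tail(n) >= (2 - math.sqrt(2)) * math.sqrt(n)
    return a1 and a2


print(all(lemma_holds(n) for n in range(4, 2001)))
print(harmonic_tail(3) <= 2 * math.log(2))  # (A.1) fails for n = 3
```

The second check illustrates why the lemma is stated for $n \ge 4$: for $n = 3$ the sum in (A.1) starts at $k = 1$ and exceeds $2\ln(2)$.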